1 Introduction

The prices of commodities or assets, recorded over time, form what is called
a time-series. Different kinds of financial time-series have been recorded and
studied for decades. Nowadays, all transactions on a financial market are
recorded, leading to a huge amount of data that is available either for free
on the Internet or commercially. Financial time-series analysis is of great
interest to practitioners as well as to theoreticians, for making inferences
and predictions. Furthermore, the stochastic uncertainties inherent in
financial time-series and the theory needed to deal with them make the subject
especially interesting not only to economists, but also to statisticians and
physicists [1]. While it would be a formidable task to make an exhaustive
review of the topic, with this review we try to give a flavor of some of its
aspects.



2 Stochastic methods in time-series analysis 

The birth of physics as a science is usually associated with the study of
mechanical objects moving with negligible fluctuations, such as the motion of
planets. However, this is not the only type of system, especially at smaller
scales, where the interaction with the environment and its influence in the
form of random fluctuations have to be taken into account. The main
theoretical tool to describe the evolution of such systems is the theory of
stochastic processes, which can be formulated in various ways: in terms of a
Master equation, a Fokker-Planck type equation, a random walk model, a
Langevin equation, or through path integrals. Some systems can present
unpredictable chaotic behavior due to dynamically generated internal noise.
Whether truly stochastic or chaotic in nature, noisy processes represent the
rule rather than the exception, not only in condensed matter physics but in
many fields such as cosmology, geology, meteorology, ecology, genetics,
sociology, and economics. In fact, the first formulation of the random walk
model and of a stochastic process was given in the framework of an economic
study [2, 3]. In the following we propose and discuss some questions which we
consider as possible landmarks in the field of time-series analysis.

2.1 Time-series versus random walk 

What if the time-series were similar to a random walk? The answer is: It 
would not be possible to predict future price movements using the past price 
movements or trends. Louis Bachelier, who was the first one to investigate this 
issue in 1900 [2], reached the conclusion that "The mathematical expectation 
of the speculator is zero" and described this condition as a "fair game." 

In economics, if $P(t)$ is the price of a stock or commodity at time $t$, then
the "log-return" is defined as $r_\tau(t) = \ln P(t+\tau) - \ln P(t)$, where
$\tau$ is the interval of time. Some statistical features of daily log-returns
are illustrated in Fig. 1, using the price time-series of General Electric.
The real empirical returns are compared in Fig. 2 with a random time-series we
generated using random numbers drawn from a Normal distribution with zero mean
and unit standard deviation.

Fig. 1. Price in USD (above), log-price (center) and log-return (below) plotted
versus time for General Electric during the period 1982-2000.

Fig. 2. Random time-series, 3000 time steps (above) and return time-series of
the S&P500 stock index, 8938 time steps (below).

If we divide the time interval $\tau$ into $N$ sub-intervals (of width
$\Delta t$), the total log-return $r_\tau(t)$ is by definition the sum of the
log-returns in each sub-interval. If the price changes in each sub-interval are
independent (Fig. 2 above) and identically distributed with a finite variance,
according to the central limit theorem the cumulative distribution function
$F(r_\tau)$ converges to a Gaussian (Normal) distribution for large $\tau$. The
Gaussian (Normal) distribution has the following properties: (a) the average
and most probable change is zero; (b) the probability of large fluctuations is
very low; (c) it is a stable distribution. The distribution of returns was
first modeled for "bonds" [2] as a Normal distribution,

$$P(r) = (\sqrt{2\pi}\,\sigma)^{-1} \exp(-r^2/2\sigma^2),$$

where $\sigma^2$ is the variance of the distribution.
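The log-return definition and its additivity over sub-intervals can be sketched in a few lines of Python. This is only an illustration under stated assumptions: NumPy is used, the price path is synthetic (not General Electric or S&P500 data), and the function names `log_returns` and `gaussian_pdf` are hypothetical.

```python
import numpy as np

def log_returns(prices, lag=1):
    """Log-return over `lag` time steps: ln P(t + lag) - ln P(t)."""
    logp = np.log(np.asarray(prices, dtype=float))
    return logp[lag:] - logp[:-lag]

def gaussian_pdf(r, sigma):
    """Zero-mean Normal density with standard deviation sigma."""
    return np.exp(-r**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)

# Synthetic price path (an assumption, not real market data).
rng = np.random.default_rng(0)
prices = 100.0 * np.exp(np.cumsum(0.01 * rng.standard_normal(3000)))

r1 = log_returns(prices, lag=1)   # one-step log-returns
r5 = log_returns(prices, lag=5)   # five-step log-returns
```

By construction, each five-step log-return equals the sum of the five one-step log-returns it spans, which is the property the central-limit argument relies on.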

In the classical financial theories Normality had always been assumed,
until Mandelbrot [4] and Fama [5] pointed out that the empirical return
distributions are fundamentally different. Namely, they are "fat-tailed" and
more peaked compared to the Normal distribution. Based on daily prices in
different markets, Mandelbrot and Fama found that $F(r_\tau)$ was a stable
Lévy distribution whose tail decays with an exponent $\alpha \simeq 1.7$. This
result suggested that short-term price changes were not well-behaved, since
most statistical properties are not defined when the variance does not exist.
Later, using more extensive data, the decay of the distribution was shown to
be fast enough to provide a finite second moment. With time, several other
interesting features of the financial data were unearthed.

The motive of physicists in analyzing financial data has been to find com- 
mon or universal regularities in the complex time-series (a different approach 
from those of the economists doing traditional statistical analysis of financial 
data). The results of their empirical studies on asset price series show that the 
apparently random variations of asset prices share some statistical properties 
which are interesting, non-trivial, and common for various assets, markets, 
and time periods. These are called "stylized empirical facts". 
2.2 "Stylized" facts

Stylized facts are usually formulated using general qualitative properties of 
asset returns. Hence, distinctive characteristics of the individual assets are 
not taken into account. Below we consider a few of them, taken from Ref. [6].

(i) Fat tails: Large returns asymptotically follow a power law
$F(r_\tau) \sim |r|^{-\alpha}$, with $\alpha > 2$. The values
$\alpha = 3.01 \pm 0.03$ and $\alpha = 2.84 \pm 0.12$ are found for the
positive and negative tail respectively [8]. An $\alpha > 2$ ensures a
well-defined second moment and excludes stable laws with infinite variance.
There have been various suggestions for the form of the distribution:
Student's-t (Fig. 3), hyperbolic, normal inverse Gaussian, exponentially
truncated stable, etc., but no general consensus has been reached yet.




Fig. 3. S&P 500 daily return distribution and normal kernel density estimate. 
Distributions of log returns normalized by the sample standard deviation rising from 
the demeaned S&P 500 (circles) and from a Tsallis distribution of index q = 1.43 
(solid line). For comparison, the normal distribution q = 1 is shown (dashed line). 
Adapted from Ref. [7]. 



(ii) Aggregational Normality: As one increases the time scale over which
the returns are calculated, their distribution approaches the Normal form.
The shape is different at different time scales. The fact that the shape of
the distribution changes with $\tau$ makes it clear that the random process
underlying prices must have non-trivial temporal structure.

(iii) Absence of linear auto-correlations: The auto-correlation of
log-returns, $\rho(T) \sim \langle r_\tau(t+T)\, r_\tau(t) \rangle$, rapidly
decays to zero for $\tau > 15$ minutes [9], which supports the "efficient
market hypothesis" (EMH), discussed in Sec. 2.3. When $\tau$ is increased,
weekly and monthly returns exhibit some auto-correlation, but the statistical
evidence varies from sample to sample.

(iv) Volatility clustering: Price fluctuations are not identically
distributed, and the properties of the distribution, such as the absolute
return or variance, change with time. This is called time-dependent or
"clustered" volatility. The volatility, measured by the absolute returns,
shows a positive auto-correlation over a long period of time, which decays
roughly as a power-law with an exponent between 0.1 and 0.3 [9, 10, 11].
Therefore high-volatility events tend to cluster in time: large changes tend
to be followed by large changes, and analogously for small changes.

Financial time-series analysis: A brief overview
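Two of the stylized facts above (absence of linear auto-correlation, and fat tails) can be probed with simple sample statistics. The following Python sketch assumes NumPy; the helper names `autocorrelation` and `excess_kurtosis` are hypothetical, and excess kurtosis is used here only as a rough proxy for fat tails, not as the tail-exponent estimate of Ref. [8].

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Sample auto-correlation of x at lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    var = np.dot(x, x) / len(x)
    return np.array([np.dot(x[:-k], x[k:]) / (len(x) * var)
                     for k in range(1, max_lag + 1)])

def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (zero for a Normal distribution)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.mean(z**4) - 3.0
```

Applied to a return series `r`, one would compare `autocorrelation(r, 50)` with `autocorrelation(np.abs(r), 50)`: for data showing volatility clustering the first should stay near zero while the second should decay slowly.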

2.3 The Efficient Market Hypothesis (EMH) 

A debatable issue in financial econometrics is whether the market is
"efficient" or not. An "efficient" asset market is one in which the
information contained in past prices is instantly, fully and continually
reflected in the asset's current price. The EMH was proposed by Eugene Fama in
his Ph.D. thesis work in the 1960's, in which he argued that in an active
market that consists of intelligent and well-informed investors, securities
would be fairly priced and reflect all the information available. To date,
there continues to be disagreement on the degree of market efficiency. The
three widely accepted forms of the EMH are:

• "Weak" form: all past market prices and data are fully reflected in securi- 
ties prices and hence technical analysis is of no use. 

• "Semistrong" form: all publicly available information is fully reflected in 
securities prices and hence fundamental analysis is of no use. 

• "Strong" form: all information is fully reflected in securities prices and 
hence even insider information is of no use. 

The EMH has provided the basis for much of the financial market research. 
In the early 1970's, evidence seemed to be available, supporting the EMH:
the prices followed a random walk and the predictable variations in returns, if 
any, turned out to be statistically insignificant. While most of the studies in 
the 1970's concentrated mainly on predicting prices from past prices, studies 
in the 1980's looked at the possibility of forecasting based on variables such 
as dividend yield, too, see e.g. Ref. [12]. Several later studies also looked at 
things such as the reaction of the stock market to the announcement of various 
events such as takeovers, stock splits, etc. In general, results from event studies 
typically showed that prices seemed to adjust to new information within a day 
of the announcement of the particular event, an inference that is consistent 
with the EMH. In the 1990's, some studies started looking at the deficiencies
of asset pricing models. The accumulating evidence suggested that stock prices
could be predicted with a fair degree of reliability. To understand whether
predictability of returns represented "rational" variations in expected
returns or simply arose as "irrational" speculative deviations from
theoretical values, further studies have been conducted in recent years.
Researchers have now discovered several stock market "anomalies" that seem to
contradict the EMH. Once an anomaly is discovered, in principle, investors
attempting to profit by exploiting such an inefficiency should cause the
anomaly to disappear. In fact, many such anomalies that have been discovered
via back-testing have subsequently disappeared or proved to be impossible to
exploit due to high transaction costs.

We would like to mention the paradoxical nature of efficient markets: if
every practitioner truly believed that a market was efficient, then the market
would not be efficient, since no one would analyze the behavior of the asset
prices. In fact, efficient markets depend on market participants who believe
the market is inefficient and trade assets in order to make the most of the
market inefficiency.

2.4 Are there any long-time correlations? 

Two of the most important and simple models of probability theory and
financial econometrics are the random walk and the martingale. In both, the
expected future price change, given the past price changes, is zero.
Their main characteristic is that the returns are uncorrelated. But are they 
truly uncorrelated or are there long-time correlations in the financial time- 
series? This question has been studied especially since it may lead to deeper 
insights about the underlying processes that generate the time-series [13]. 

Next we discuss two measures to quantify the long-time correlations, and 
study the strength of trends: the R/S analysis to calculate the Hurst exponent 
and the detrended fluctuation analysis [14]. 

Hurst Exponent from R/S Analysis 

In order to measure the strength of trends or "persistence" in different
processes, the rescaled range (R/S) analysis to calculate the Hurst exponent
can be used. One studies the rate of change of the rescaled range with the
change of the length of time over which measurements are made. We divide the
time-series $\xi_t$ of length $T$ into $N$ periods of length $\tau$, such that
$N\tau = T$. For each period $i = 1, 2, \ldots, N$, containing $\tau$
observations, the cumulative deviation is

$$X(\tau) = \sum_{t=(i-1)\tau+1}^{i\tau} \left(\xi_t - \langle\xi\rangle_\tau\right), \tag{1}$$

where $\langle\xi\rangle_\tau$ is the mean within the time-period and is given by

$$\langle\xi\rangle_\tau = \frac{1}{\tau} \sum_{t=(i-1)\tau+1}^{i\tau} \xi_t . \tag{2}$$

The range in the $i$-th time period is given by
$R(\tau) = \max X(\tau) - \min X(\tau)$, where the maximum and minimum are
taken over the partial sums obtained as the upper limit of the sum in Eq. (1)
runs through the period, and the standard deviation is given by

$$S(\tau) = \left[ \frac{1}{\tau} \sum_{t=(i-1)\tau+1}^{i\tau} \left(\xi_t - \langle\xi\rangle_\tau\right)^2 \right]^{1/2} . \tag{3}$$

Then $R(\tau)/S(\tau)$ is asymptotically given by a power-law

$$R(\tau)/S(\tau) = \kappa\,\tau^H , \tag{4}$$

where $\kappa$ is a constant and $H$ the Hurst exponent. In general,
"persistent" behavior with fractal properties is characterized by a Hurst
exponent $0.5 < H \le 1$, random behavior by $H = 0.5$ and "anti-persistent"
behavior by $0 \le H < 0.5$. Usually Eq. (4) is rewritten in terms of
logarithms, $\log(R/S) = H \log(\tau) + \log(\kappa)$, and the Hurst exponent
is determined from the slope.
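The R/S procedure described above can be sketched in Python as follows. This is a minimal illustration, assuming NumPy; the function names are hypothetical, and averaging R/S over the periods before fitting is an assumption, since the text does not say how the periods are combined.

```python
import numpy as np

def rescaled_range(xi, tau):
    """Average R/S over the non-overlapping periods of length tau."""
    xi = np.asarray(xi, dtype=float)
    n_periods = len(xi) // tau
    ratios = []
    for i in range(n_periods):
        block = xi[i * tau:(i + 1) * tau]
        dev = np.cumsum(block - block.mean())  # partial sums of the deviations
        r = dev.max() - dev.min()              # range within the period
        s = block.std()                        # standard deviation in the period
        if s > 0:
            ratios.append(r / s)
    return np.mean(ratios)

def hurst_exponent(xi, taus):
    """Slope of log(R/S) versus log(tau)."""
    rs = [rescaled_range(xi, tau) for tau in taus]
    slope, _intercept = np.polyfit(np.log(taus), np.log(rs), 1)
    return slope
```

For example, `hurst_exponent(x, [16, 32, 64, 128, 256])` on uncorrelated noise should give a value in the neighborhood of 0.5; for short periods the classical R/S estimate is known to be biased somewhat upward.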

Detrended Fluctuation Analysis (DFA) 

In the DFA method the time-series $\xi_t$ of length $T$ is first divided into
$N$ non-overlapping periods of length $\tau$, such that $N\tau = T$. In each
period $i = 1, 2, \ldots, N$ the time-series is first fitted through a linear
function $z_t = at + b$, called the local trend. Then it is detrended by
subtracting the local trend, in order to compute the fluctuation function,

$$F(\tau) = \left[ \frac{1}{\tau} \sum_{t=(i-1)\tau+1}^{i\tau} \left(\xi_t - z_t\right)^2 \right]^{1/2} . \tag{5}$$

The function $F(\tau)$ is re-computed for different box sizes $\tau$
(different scales) to obtain the relationship between $F(\tau)$ and $\tau$. A
power-law relation between $F(\tau)$ and the box size $\tau$,
$F(\tau) \sim \tau^\alpha$, indicates the presence of scaling. The scaling or
"correlation exponent" $\alpha$ quantifies the correlation properties of the
signal: if $\alpha = 0.5$ the signal is uncorrelated (white noise); if
$\alpha < 0.5$ the signal is anti-correlated; if $\alpha > 0.5$, there are
positive correlations in the signal.
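A possible Python sketch of DFA with linear detrending is given below. It assumes NumPy, and the function names are hypothetical. Note one assumption in particular: following the standard DFA convention, the series is first converted to its cumulative sum (the "profile") before boxes are detrended; the description above does not spell out this step, but it is what yields an exponent of 0.5 for white noise.

```python
import numpy as np

def dfa_fluctuation(xi, tau):
    """Root-mean-square detrended fluctuation for box size tau."""
    xi = np.asarray(xi, dtype=float)
    profile = np.cumsum(xi - xi.mean())   # assumption: standard DFA profile
    n_boxes = len(profile) // tau
    t = np.arange(tau)
    sq = []
    for i in range(n_boxes):
        box = profile[i * tau:(i + 1) * tau]
        a, b = np.polyfit(t, box, 1)      # local linear trend a*t + b
        sq.append(np.mean((box - (a * t + b))**2))
    return np.sqrt(np.mean(sq))

def dfa_exponent(xi, taus):
    """Slope of log F(tau) versus log tau."""
    f = [dfa_fluctuation(xi, tau) for tau in taus]
    slope, _intercept = np.polyfit(np.log(taus), np.log(f), 1)
    return slope
```

With this convention, uncorrelated noise gives an exponent close to 0.5, and a random walk (the cumulative sum of such noise) gives an exponent close to 1.5.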

Comparison of different time-series 

Besides comparing empirical financial time-series with randomly generated
time-series, here we also make the comparison with multivariate spatiotemporal
time-series drawn from coupled map lattices and with the multiplicative
stochastic process GARCH(1,1) used to model financial time-series.

Multivariate spatiotemporal time-series drawn from coupled map lattices 

The concept of coupled map lattices (CML) was introduced as a simple model
capable of displaying complex dynamical behavior generic to many
spatiotemporal systems [15, 16]. Coupled map lattices are discrete in time and
space, but have a continuous state space. By changing the system parameters,
one can tune the dynamics toward the desired spatial correlation properties,
many of them already studied and reported [16]. We consider the class of
diffusively coupled map lattices in one dimension, with sites
$i = 1, 2, \ldots, n$, of the form

$$y_i^{t+1} = (1-\epsilon) f(y_i^t) + \epsilon\,[\, f(y_{i+1}^t) + f(y_{i-1}^t) \,]/2 , \tag{6}$$

where $f(y) = 1 - a y^2$ is the logistic map whose dynamics is controlled by
the parameter $a$, and the parameter $\epsilon$ measures the coupling strength
between nearest-neighbor sites. We generally choose periodic boundary
conditions, $x(n+1) = x(1)$, i.e., site $n+1$ is identified with site 1. In
the numerical computations reported by Chakraborti and Santhanam [17], a
coupled map lattice with $n = 500$ was iterated starting from random initial
conditions, for $p = 5 \times 10^7$ time steps, after discarding $10^5$
transient iterates. As the parameters $a$ and $\epsilon$ are varied, the
spatiotemporal map displays various dynamical features like frozen random
patterns, pattern selection, space-time intermittency, and spatiotemporal
chaos [16]. In order to study the coupled map lattice dynamics found in the
regime of spatiotemporal chaos, where correlations are known to decay rather
quickly as a function of the lattice site, the parameters were chosen as
$a = 1.97$ and $\epsilon = 0.4$.
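The lattice update can be sketched in Python as below, assuming NumPy. The function names are hypothetical, and the default lattice size and run length are deliberately much smaller than those of Ref. [17] so that the example runs quickly; only the parameter values 1.97 and 0.4 are taken from the text.

```python
import numpy as np

def logistic(y, a):
    """Local map f(y) = 1 - a*y^2."""
    return 1.0 - a * y**2

def cml_step(y, a, eps):
    """One update of a diffusively coupled lattice with periodic boundaries."""
    fy = logistic(y, a)
    # np.roll(fy, -1)[i] is f at site i+1, np.roll(fy, 1)[i] is f at site i-1.
    return (1.0 - eps) * fy + 0.5 * eps * (np.roll(fy, -1) + np.roll(fy, 1))

def simulate_cml(n=50, steps=1000, transient=500, a=1.97, eps=0.4, seed=0):
    """Return an array of shape (steps, n) after discarding transients."""
    rng = np.random.default_rng(seed)
    y = rng.uniform(-1.0, 1.0, size=n)    # random initial conditions
    for _ in range(transient):
        y = cml_step(y, a, eps)
    out = np.empty((steps, n))
    for t in range(steps):
        y = cml_step(y, a, eps)
        out[t] = y
    return out
```

Each column of the returned array is the time-series at one lattice site, which is the multivariate form used when such data are compared with stock returns.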

Multiplicative stochastic process GARCH(1,1) 

There has been considerable interest in the application of ARCH/GARCH models
to financial time-series, which exhibit periods of unusually large volatility
followed by periods of relative tranquility. The assumption of constant
variance or "homoskedasticity" is inappropriate in such circumstances. A
stochastic process with autoregressive conditional "heteroskedasticity" (ARCH)
is actually a stochastic process with "non-constant variances conditional on
the past but constant unconditional variances" [18]. The ARCH(p) process is
defined by the equation

$$\sigma_t^2 = \alpha_0 + \alpha_1 x_{t-1}^2 + \ldots + \alpha_p x_{t-p}^2 , \tag{7}$$

where the $\{\alpha_0, \alpha_1, \ldots, \alpha_p\}$ are positive parameters
and $x_t$ is a random variable with zero mean and variance $\sigma_t^2$,
characterized by a conditional probability distribution function $f_t(x)$,
which may be chosen as Gaussian. The nature of the memory of the variance
$\sigma_t^2$ is determined by the parameter $p$.

The generalized ARCH process GARCH(p, q) was introduced by Bollerslev
[19] and is defined by the equation

$$\sigma_t^2 = \alpha_0 + \alpha_1 x_{t-1}^2 + \ldots + \alpha_q x_{t-q}^2 + \beta_1 \sigma_{t-1}^2 + \ldots + \beta_p \sigma_{t-p}^2 , \tag{8}$$

where $\{\beta_1, \ldots, \beta_p\}$ are additional control parameters.

The simplest GARCH process is the GARCH(1,1) process, with Gaussian
conditional probability distribution function,

$$\sigma_t^2 = \alpha_0 + \alpha_1 x_{t-1}^2 + \beta_1 \sigma_{t-1}^2 . \tag{9}$$

The random variable $x_t$ can be written in terms of $\sigma_t$ by defining
$x_t \equiv \eta_t \sigma_t$, where $\eta_t$ is a random Gaussian process with
zero mean and unit variance. One can rewrite Eq. (9) as a random
multiplicative process

$$\sigma_t^2 = \alpha_0 + (\alpha_1 \eta_{t-1}^2 + \beta_1)\, \sigma_{t-1}^2 . \tag{10}$$
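A GARCH(1,1) series can be generated with a short recursion. The sketch below assumes NumPy; the function name `simulate_garch11` is hypothetical, and starting the recursion from the unconditional variance $\alpha_0/(1-\alpha_1-\beta_1)$ is a convenience choice made here (it requires $\alpha_1 + \beta_1 < 1$), not something specified in the text.

```python
import numpy as np

def simulate_garch11(n, alpha0, alpha1, beta1, seed=0):
    """Simulate x_t = eta_t * sigma_t with GARCH(1,1) conditional variance."""
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal(n)          # Gaussian noise, zero mean, unit variance
    sigma2 = np.empty(n)
    x = np.empty(n)
    # Assumption: start from the unconditional variance (needs alpha1 + beta1 < 1).
    sigma2[0] = alpha0 / (1.0 - alpha1 - beta1)
    x[0] = eta[0] * np.sqrt(sigma2[0])
    for t in range(1, n):
        sigma2[t] = alpha0 + alpha1 * x[t - 1]**2 + beta1 * sigma2[t - 1]
        x[t] = eta[t] * np.sqrt(sigma2[t])
    return x, sigma2
```

For instance, `x, sigma2 = simulate_garch11(3000, 0.00023, 0.09, 0.01)` produces a series whose sample variance should lie close to the unconditional value, while `sigma2` shows the time-dependent conditional variance.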

DFA analysis of auto-correlation function of absolute returns 

The analysis of financial correlations was done in 1997 by the group of H.E.
Stanley [10]. The correlation functions of the financial indices of the New
York stock exchange and the S&P 500 between January 1984 and December 1996
were analyzed at one-minute intervals. The study confirmed that the
auto-correlation function of the returns fell off exponentially but that of
the absolute value of the returns did not. Correlations of the absolute values
of the index returns could be described through two different power laws, with
crossover time $t_\times \approx 600$ minutes, corresponding to 1.5 trading
days. Results from power spectrum analysis and DFA analysis were found to be
consistent. The power spectrum analysis of Fig. 4 yielded exponents
$\beta_1 = 0.31$ and $\beta_2 = 0.90$ for $f > f_\times$ and $f < f_\times$,
respectively. This is consistent with the relations $\alpha = (1+\beta)/2$ and
$t_\times \approx 1/f_\times$, given the exponents $\alpha_1 = 0.66$ and
$\alpha_2 = 0.93$ obtained from detrended fluctuation analysis for
$t < t_\times$ and $t > t_\times$, respectively.
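The consistency between the two sets of exponents can be checked with a line of arithmetic. The helper name below is hypothetical; the relation between the DFA exponent and the spectral exponent, and the numerical values, are those quoted in the text.

```python
def dfa_exponent_from_spectrum(beta):
    """DFA exponent implied by a power-spectrum exponent beta: (1 + beta) / 2."""
    return (1.0 + beta) / 2.0

alpha_short = dfa_exponent_from_spectrum(0.31)  # compare with the reported 0.66
alpha_long = dfa_exponent_from_spectrum(0.90)   # compare with the reported 0.93
crossover_frequency = 1.0 / 600.0               # in 1/min, from a 600-minute crossover
```

The implied values agree with the reported DFA exponents to within about 0.02.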




Fig. 4. Power spectrum analysis (left, versus frequency $f$ in min$^{-1}$) and
detrended fluctuation analysis (right, versus time $t$ in min) of the
auto-correlation function of absolute returns, from Ref. [10].



Numerical Comparison 

In order to provide an illustrative example, in Fig. 5 a comparison among
various analysis techniques and processes is presented, while the values of
the exponents from the Hurst and DFA analyses are listed in Table 1. For the
numerical computations reported by Chakraborti and Santhanam [17], the
parameter values chosen were $\alpha_0 = 0.00023$, $\alpha_1 = 0.09$ and
$\beta_1 = 0.01$.



A. Chakraborti, M. Patriarca, and M.S. Santhanam




Fig. 5. R/S (left) and DFA (right) analyses, plotted against $\log(\tau)$:
random time-series, 3000 time steps (black solid line); multivariate
spatiotemporal time-series drawn from coupled map lattices with parameters
$a = 1.97$ and $\epsilon = 0.4$, 3000 time steps (black filled up-triangles);
multiplicative stochastic process GARCH(1,1) with parameters
$\alpha_0 = 0.00023$, $\alpha_1 = 0.09$ and $\beta_1 = 0.01$, 3000 time steps
(red filled squares); return time-series of the S&P500 stock index, 8938 time
steps (blue filled circles).



Table 1. Hurst and DFA exponents.

Process             Hurst exponent   DFA exponent
Random              0.50             0.50
Chaotic (CML)       0.46             0.48
GARCH(1,1)          0.63             0.51
Financial Returns   0.99             0.51



3 Random Matrix methods in time-series analysis 

The R/S and the detrended fluctuation analysis considered in the previous 
section are suitable for analyzing univariate data. Since the stock-market data 
are essentially multivariate time-series data, it is worth constructing a corre- 
lation matrix to study its spectra and contrasting it with random multivari- 
ate data from coupled map lattice. Empirical spectra of correlation matrices, 
drawn from time-series data, are known to follow mostly random matrix the- 
ory (RMT) [20]. 

3.1 Correlation matrix and Eigenvalue density 
Correlation matrix 

Financial Correlation matrix 

If there are $N$ assets with a price $P_i(t)$ for asset $i$ at time $t$, the
logarithmic return of stock $i$ is $r_i(t) = \ln P_i(t) - \ln P_i(t-1)$. A
sequence of such values for a given period of time forms the return vector
$\mathbf{r}_i$. In order to characterize the synchronous time evolution of
stocks, one defines the equal-time correlation coefficients between stocks $i$
and $j$,

$$\rho_{ij} = \frac{\langle r_i r_j \rangle - \langle r_i \rangle \langle r_j \rangle}{\sqrt{[\langle r_i^2 \rangle - \langle r_i \rangle^2]\,[\langle r_j^2 \rangle - \langle r_j \rangle^2]}} , \tag{11}$$

where $\langle \ldots \rangle$ indicates a time average over the trading days
included in the return vectors. The correlation coefficients $\rho_{ij}$ form
an $N \times N$ matrix, with $-1 \le \rho_{ij} \le 1$. If $\rho_{ij} = 1$, the
stock price changes are completely correlated; if $\rho_{ij} = 0$, the stock
price changes are uncorrelated; and if $\rho_{ij} = -1$, the stock price
changes are completely anti-correlated.
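The equal-time correlation coefficients can be computed directly from a matrix of returns. The sketch below assumes NumPy and a returns array with one column per asset; the function name `correlation_matrix` is hypothetical, and the random data in any usage is a stand-in for real returns.

```python
import numpy as np

def correlation_matrix(returns):
    """Equal-time correlation matrix from returns of shape (T, N)."""
    r = np.asarray(returns, dtype=float)
    mean = r.mean(axis=0)
    # Time averages <r_i r_j> - <r_i><r_j>
    cov = (r.T @ r) / r.shape[0] - np.outer(mean, mean)
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)
```

The result should agree with NumPy's built-in `np.corrcoef(returns, rowvar=False)`, which can serve as a cross-check.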

Correlation matrix from spatiotemporal series from coupled map lattices 

Consider a time-series of the form $z'(x,t)$, where $x = 1, 2, \ldots, n$ and
$t = 1, 2, \ldots, p$ denote the discretized space and time. In this way, the
time-series at every spatial point is treated as a different variable. We
define

$$z(x,t) = [\, z'(x,t) - \langle z'(x) \rangle \,]/\sigma(x) , \tag{12}$$

as the normalized variable, with the brackets $\langle \cdot \rangle$
representing a temporal average and $\sigma(x)$ the standard deviation of $z'$
at position $x$. Then, the equal-time cross-correlation matrix can be written
as

$$S_{x,x'} = \langle z(x,t)\, z(x',t) \rangle , \qquad x, x' = 1, 2, \ldots, n. \tag{13}$$

This correlation matrix is symmetric by construction. In addition, a large
class of processes is translationally invariant and the correlation matrix
will possess the corresponding symmetry. We use this property for our
correlation models in the context of coupled map lattices. In time-series
analysis, the averages $\langle \cdot \rangle$ have to be replaced by
estimates obtained from finite samples. We use the maximum likelihood
estimates, i.e., $\langle a(t) \rangle \approx \frac{1}{p} \sum_{t=1}^{p} a(t)$.
These estimates contain statistical uncertainties which disappear for
$p \to \infty$. Ideally we require $p \gg n$ to have reasonably correct
correlation estimates.
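For spatiotemporal data the same construction amounts to normalizing each spatial site and averaging products over time. The following Python sketch assumes NumPy and an input array of shape (p, n), with time along the rows; the function name is hypothetical.

```python
import numpy as np

def equal_time_cross_correlation(z_raw):
    """Cross-correlation matrix of shape (n, n) from data of shape (p, n)."""
    z_raw = np.asarray(z_raw, dtype=float)
    # Normalize each site: subtract its temporal mean, divide by its std.
    z = (z_raw - z_raw.mean(axis=0)) / z_raw.std(axis=0)
    # Finite-sample estimate of the temporal average of z(x, t) * z(x', t).
    return (z.T @ z) / z.shape[0]
```

With fewer time steps than sites the estimated matrix becomes rank-deficient, which is one way to see why the series length should be much larger than the number of variables.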



Eigenvalue Density 

The interpretation of the spectra of empirical correlation matrices should be 
done carefully in order to distinguish between system specific signatures and 
universal features. The former ones express themselves in a smoothed level 
density, whereas the latter ones are usually represented by the fluctuations 
on top of such a smooth curve. In time-series analysis, matrix elements are 
not only prone to uncertainties such as measurement noise on the time-series 
data, but also to the statistical fluctuations due to finite sample effects. When 
characterizing time series data in terms of RMT, we are not interested in these 
sources of fluctuations, which are present on every data set, but we want to 
identify the significant features which would be shared, in principle, by an 
"infinite" amount of data without measurement noise. The eigenfunctions of
the correlation matrices constructed from such empirical time-series carry the 
information contained in the original time-series data in a "graded" manner 
and provide a compact representation for it. Thus, by applying an approach 
based on RMT, we try to identify non-random components of the correlation 
matrix spectra as deviations from RMT predictions [20] . 

We now consider the eigenvalue density, studied in applications of RMT
methods to time-series correlations. Let $\mathcal{N}(\lambda)$ be the
integrated eigenvalue density, giving the number of eigenvalues smaller than a
given $\lambda$. The eigenvalue or level density,
$\rho(\lambda) = d\mathcal{N}(\lambda)/d\lambda$, can be obtained assuming a
random correlation matrix [21]. Results are found to be in good agreement with
the empirical time-series data from stock market fluctuations [22]. From RMT
considerations, the eigenvalue density for random correlations is given by

$$\rho_{\mathrm{rmt}}(\lambda) = \frac{Q}{2\pi\lambda} \sqrt{(\lambda_{\max} - \lambda)(\lambda - \lambda_{\min})} . \tag{14}$$

Here $Q = T/N$ is the ratio of the length $T$ of each time-series to the
number $N$ of variables, while $\lambda_{\min} = 1 + 1/Q - 2\sqrt{1/Q}$ and
$\lambda_{\max} = 1 + 1/Q + 2\sqrt{1/Q}$ represent the minimum and maximum
eigenvalues of the random correlation matrix. The presence of correlations in
the empirical correlation matrix produces a violation of this form of
eigenvalue density, for a certain number of dominant eigenvalues, often
corresponding to system-specific information in the data. As examples, Fig. 6
shows the eigenvalue densities for S&P500 data and for the chaotic data from
the coupled map lattice: the curves are qualitatively different from the form
of Eq. (14).
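The random-matrix prediction can be compared numerically with the spectrum of a correlation matrix built from purely random data. The Python sketch below assumes NumPy and takes the ratio `q` as series length divided by number of variables, the convention under which the density is normalized to one for `q >= 1`; the function names and the matrix dimensions are illustrative choices.

```python
import numpy as np

def rmt_bounds(q):
    """Smallest and largest eigenvalue predicted for a random correlation matrix."""
    lam_min = 1.0 + 1.0 / q - 2.0 * np.sqrt(1.0 / q)
    lam_max = 1.0 + 1.0 / q + 2.0 * np.sqrt(1.0 / q)
    return lam_min, lam_max

def rmt_density(lam, q):
    """Eigenvalue density for random correlations; zero outside the bounds."""
    lam = np.asarray(lam, dtype=float)
    lam_min, lam_max = rmt_bounds(q)
    inside = (lam > lam_min) & (lam < lam_max)
    rho = np.zeros_like(lam)
    rho[inside] = q / (2.0 * np.pi * lam[inside]) * np.sqrt(
        (lam_max - lam[inside]) * (lam[inside] - lam_min))
    return rho

# Uncorrelated Gaussian data: 100 variables, 400 time steps (so q = 4).
rng = np.random.default_rng(1)
n_vars, length = 100, 400
data = rng.standard_normal((length, n_vars))
eigenvalues = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))
```

A histogram of `eigenvalues` should follow `rmt_density` closely, with essentially all eigenvalues inside the predicted bounds; for real market data, eigenvalues well above the upper bound are the candidates for genuine correlations.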





Fig. 6. Spectral density for multivariate spatiotemporal time-series drawn from 
coupled map lattices (left) and eigenvalue density for the return time-series of the 
S&P500 stock market data, 8938 time steps (right). 
3.2 Earlier estimates and studies using Random Matrix Theory (RMT)

Laloux et al. [23] showed that results from RMT were useful to understand 
the statistical structure of the empirical correlation matrices appearing in 
the study of price fluctuations. The empirical determination of a correlation 
matrix is a difficult task. If one considers N assets, the correlation matrix 
contains $N(N-1)/2$ mathematically independent elements, which must be
determined from N time-series of length T. If T is not very large compared 
to N, then generally the determination of the covariances is noisy, and there- 
fore the empirical correlation matrix is to a large extent random. The smallest 
eigenvalues of the matrix are the most sensitive to this "noise" . But the eigen- 
vectors corresponding to these smallest eigenvalues determine the minimum 
risk portfolios in Markowitz's theory. It is thus important to distinguish "sig- 
nal" from "noise" or, in other words, to extract the eigenvectors and eigenval- 
ues of the correlation matrix, containing real information (which is important 
for risk control), from those which do not contain any useful information and 
are unstable in time. It is useful to compare the properties of an empirical 
correlation matrix to a "null hypothesis" — a random matrix which arises for 
example from a finite time-series of strictly uncorrelated assets. Deviations 
from the random matrix case might then suggest the presence of true infor- 
mation. The main result of the study was a remarkable agreement between 
theoretical predictions, based on the assumption that the correlation matrix 
is random, and empirical data concerning the density of eigenvalues. This is 
shown in Fig. 7 for the time-series of the different stocks of the S&P 500 (or 
other stock markets). 




Fig. 7. Eigenvalue spectrum of the correlation matrices, adapted from Ref. [23]. 
Cross-correlations in financial data were also studied by Plerou et al.
[24], who analyzed price fluctuations of different stocks through RMT. Using
two large databases, they calculated cross-correlation matrices of returns
constructed from: (i) 30-min returns of 1000 US stocks for the period
1994-95; (ii) 30-min returns of 881 US stocks for the period 1996-97; (iii)
1-day returns of 422 US stocks for the period 1962-96. They tested the
statistics of the eigenvalues $\lambda_i$ of cross-correlation matrices
against a "null hypothesis" and found that a majority of the eigenvalues of
the cross-correlation matrices were within the RMT bounds
$(\lambda_{\min}, \lambda_{\max})$ defined above for random correlation
matrices. Furthermore, they analyzed the eigenvalues of the cross-correlation 
matrices within the RMT bound for universal properties of random matrices 
and found good agreement with the results for the Gaussian orthogonal en- 
semble (GOE) of random matrices, implying a large degree of randomness in 
the measured cross-correlation coefficients. It was found that: (i) the distri- 
bution of eigenvector components, for the eigenvectors corresponding to the 
eigenvalues outside the RMT bound, displayed systematic deviations from the 
RMT prediction; (ii) such "deviating eigenvectors" were stable in time; (iii) 
the largest eigenvalue corresponded to an influence common to all stocks; (iv) 
the remaining deviating eigenvectors showed distinct groups, whose identities 
corresponded to conventionally-identified business sectors. 

4 Approximate Entropy method in time-series analysis 

The Approximate Entropy (ApEn) method is an information-theory-based estimate
of the complexity of a time series introduced by S. Pincus [25], formally
based on the evaluation of joint probabilities, in a way similar to the
entropy of Eckmann and Ruelle. The original motivation and main feature,
however, was not to characterize an underlying chaotic dynamics, but rather to
provide a robust model-independent measure of the randomness of a time series
of real data, possibly (as is usual in practical cases) from a limited data
set affected by superimposed noise. ApEn has by now been used to analyze data
obtained from very different sources, such as digits of irrational and
transcendental numbers, hormone levels, clinical cardiovascular time-series,
anesthesia depth, EEG time-series, and respiration in various conditions.

Given a sequence of $N$ numbers $\{u(j)\} = \{u(1), u(2), \ldots, u(N)\}$,
with equally spaced times $t_{j+1} - t_j \equiv \Delta t = \mathrm{const}$,
one first extracts the sequences with embedding dimension $m$, i.e.,
$x(i) = \{u(i), u(i+1), \ldots, u(i+m-1)\}$, with $1 \le i \le N-m+1$. The
ApEn is then computed as

$$\mathrm{ApEn} = \Phi^m(r) - \Phi^{m+1}(r) , \tag{15}$$

where $r$ is a real number representing a threshold distance between series,
and the quantity $\Phi^m(r)$ is defined as

$$\Phi^m(r) = \langle \ln[C_i^m(r)] \rangle = \sum_{i=1}^{N-m+1} \ln[C_i^m(r)]/(N-m+1) . \tag{16}$$

Here $C_i^m(r)$ is the probability that the series $x(i)$ is closer to a
generic series $x(j)$ ($j \le N-m+1$) than the threshold $r$,

$$C_i^m(r) = \mathcal{N}[d(i,j) \le r]/(N-m+1) , \tag{17}$$

with $\mathcal{N}[d(i,j) \le r]$ the number of sequences $x(j)$ whose distance
from $x(i)$ does not exceed $r$. As the definition of distance between two
sequences, the maximum difference (in modulus) between the respective elements
is used,

$$d(i,j) = \max_{k=1,2,\ldots,m} \left( |u(j+k-1) - u(i+k-1)| \right) . \tag{18}$$

Quoting Pincus and Kalman [26], ". . . ApEn measures the logarithmic
frequency that runs of patterns that are close (within r) for m contiguous
observations remain close (within the same tolerance width r) on the next
incremental comparison". Comparisons are intended to be done at fixed $m$ and
$r$, the general ApEn(m,r) being in fact a family of parameters.
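The definition of ApEn translates into a compact computation. The Python sketch below assumes NumPy; the function names are hypothetical, the distance comparison is taken as "less than or equal to r" with each sequence counted as matching itself, and the pairwise-distance array makes it suitable only for short series (a few thousand points at most).

```python
import numpy as np

def _phi(u, m, r):
    """Average of ln C_i^m(r) over all embedded sequences of length m."""
    n = len(u) - m + 1
    x = np.array([u[i:i + m] for i in range(n)])   # embedded sequences x(i)
    # Maximum-norm distance between every pair of sequences.
    d = np.max(np.abs(x[:, None, :] - x[None, :, :]), axis=2)
    c = np.sum(d <= r, axis=1) / n                 # self-matches are included
    return np.mean(np.log(c))

def approximate_entropy(u, m=2, r=0.2):
    """ApEn(m, r) of the sequence u."""
    u = np.asarray(u, dtype=float)
    return _phi(u, m, r) - _phi(u, m + 1, r)
```

A common practical choice, used here only as an example, is to set the threshold relative to the data spread, e.g. `approximate_entropy(u, 2, 0.2 * np.std(u))`. A strictly periodic sequence then gives a value near zero, while an irregular one gives a clearly larger value.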

In economics, the ApEn method has been shown to be a reliable estimate of
market efficiency [25, 26, 27] and has been applied to various economically
relevant events. For instance, the ApEn computed for the S&P 500 index showed
a drastic increase in the two-week period preceding the stock market crash of
1987. Just before the Asian crisis of November 1997, the ApEn computed for
Hong Kong's Hang Seng index, from 1992 to 1998, assumed its highest values.
More recently, a broader investigation carried out for various countries
through the ApEn by Oh, Kim, and Eom revealed a systematic difference between
the efficiencies of the markets in the periods before and after the Asian
crisis [28].